Applications of Regularized Least Squares to Classification Problems

Author

  • Nicolò Cesa-Bianchi
Abstract

We present a survey of recent results concerning the theoretical and empirical performance of algorithms for learning regularized least-squares classifiers. The behavior of this family of learning algorithms is analyzed in both the statistical and the worst-case (individual sequence) data-generating models.

1 Regularized Least-Squares for Classification

In the pattern classification problem, some unknown source is supposed to generate a sequence x_1, x_2, ... of instances (data elements) x_t ∈ X, where X is usually taken to be R^d for some fixed d. Each instance x_t is associated with a class label y_t ∈ Y, where Y is a finite set of classes, indicating a certain semantic property of the instance. For instance, in a handwritten digit recognition task, x_t is the digitized image of a handwritten digit and its label y_t ∈ {0, 1, ..., 9} is the corresponding numeral. A learning algorithm for pattern classification uses a set of training examples, that is, pairs (x_t, y_t), to build a classifier f : X → Y that predicts, as accurately as possible, the labels of any further instances generated by the source. As training data are usually labelled by hand, the performance of a learning algorithm is measured in terms of its ability to trade off predictive power against the amount of training data.

Due to the recent success of kernel methods (see, e.g., [8]), linear-threshold (L-T) classifiers of the form f(x) = sgn(w^T x), where w ∈ R^d is the classifier parameter and sgn(·) ∈ {−1, +1} is the signum function, have become one of the most popular approaches to binary classification problems (where the label set is Y = {−1, +1}). In this paper, we focus on a specific family of algorithms for learning L-T classifiers based on the solution of a regularized least-squares problem.

The basic algorithm within this family is the second-order Perceptron algorithm [2]. This algorithm works incrementally: each time a new training example (x_t, y_t) is obtained, a prediction ŷ_t = sgn(w_t^T x_t) for the label y_t of x_t is computed, where w_t is the parameter of the current L-T classifier. If ŷ_t ≠ y_t, then w_t is updated and a new L-T classifier, with parameter w_{t+1}, is generated. The second-order Perceptron starts with the constant L-T classifier w_1 = (0, ..., 0) and, at time step t, computes

    w_{t+1} = (a I + S_t S_t^T)^{-1} S_t y_t,

where a > 0 is a parameter, I is the identity matrix, and S_t is the matrix whose columns are the previously stored instances x_s; that is, those instances x_s (1 ≤ s ≤ t) such that sgn(w_s^T x_s) ≠ y_s. Finally, y_t is the vector of labels of the stored instances. Note that, given S_{t−1}, y_{t−1}, and (x_t, y_t), the parameter w_{t+1} can be computed in time Θ(d^2). Working in dual variables (as when kernels are used), the update time becomes quadratic in the number of stored instances. The connection with regularized least squares is revealed by the identity

    w_{t+1} = arginf_{v ∈ R^d} ( a ||v||^2 + Σ_s (y_s − v^T x_s)^2 ),

where the sum ranges over the stored instances.
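As a concrete illustration of the update rule above, the following sketch runs the second-order Perceptron in primal form and then checks the regularized least-squares identity numerically. It is written in Python with NumPy; the synthetic data, the choice a = 1, and all helper names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def second_order_perceptron(stream, d, a=1.0):
    """Run the second-order Perceptron on a stream of (x, y) pairs, y in {-1, +1}.

    Only examples on which the current classifier makes a mistake are stored;
    after each mistake the weights are recomputed as w = (a*I + S S^T)^{-1} S y_vec,
    where the columns of S are the stored instances.  For clarity this sketch
    redoes the d x d solve at every mistake; the Theta(d^2) per-update cost
    mentioned in the text requires maintaining the inverse incrementally
    (e.g., via a rank-one Sherman-Morrison update).
    """
    stored_x, stored_y = [], []
    w = np.zeros(d)                      # w_1 = (0, ..., 0)
    mistakes = 0
    for x, y in stream:
        y_hat = np.sign(w @ x) or 1.0    # predict; break the tie sgn(0) as +1
        if y_hat != y:                   # mistake-driven update
            mistakes += 1
            stored_x.append(x)
            stored_y.append(y)
            S = np.column_stack(stored_x)             # d x m matrix of stored instances
            y_vec = np.asarray(stored_y, dtype=float)
            w = np.linalg.solve(a * np.eye(d) + S @ S.T, S @ y_vec)
    return w, stored_x, stored_y, mistakes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T, a = 5, 200, 1.0
    u = rng.normal(size=d)                            # hypothetical target hyperplane
    X = rng.normal(size=(T, d))
    labels = np.where(X @ u >= 0, 1.0, -1.0)
    w, sx, sy, mistakes = second_order_perceptron(zip(X, labels), d, a)
    print("mistakes:", mistakes)

    # Regularized least-squares identity: the gradient of
    #     a*||v||^2 + sum_s (y_s - v^T x_s)^2
    # over the stored instances vanishes at v = w.
    S = np.column_stack(sx)
    y_vec = np.asarray(sy)
    grad = 2 * a * w - 2 * S @ (y_vec - S.T @ w)
    print("gradient norm at w:", np.linalg.norm(grad))   # ~ 0 up to rounding error
```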

Similar articles

Applications of regularized least squares to pattern classification

We survey a number of recent results concerning the behaviour of algorithms for learning classifiers based on the solution of a regularized least-squares problem.

A coordinate gradient descent method for ℓ1-regularized convex minimization

In applications such as signal processing and statistics, many problems involve finding sparse solutions to under-determined linear systems of equations. These problems can be formulated as structured nonsmooth optimization problems, i.e., the problem of minimizing ℓ1-regularized linear least-squares objectives. In this paper, we propose a block coordinate gradient descent method (abbreviated a...
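The objective sketched above can also be attacked with plain cyclic coordinate descent using soft-thresholding updates. The sketch below (Python with NumPy) shows that simpler variant on the ℓ1-regularized least-squares problem 0.5*||Xw − y||^2 + λ||w||_1; it is not the block coordinate gradient descent method proposed in the paper, and the data, the value of λ, and the helper names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: the proximal operator of t*|.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for  0.5*||X w - y||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)          # ||x_j||^2 for each column
    r = y - X @ w                          # running residual
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r + col_sq[j] * w[j]   # correlation with the partial residual
            w_j_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (w[j] - w_j_new)        # keep the residual consistent
            w[j] = w_j_new
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 50, 100                          # under-determined system, as in the abstract
    X = rng.normal(size=(n, d))
    w_true = np.zeros(d)
    w_true[:5] = rng.normal(size=5)         # sparse ground truth
    y = X @ w_true + 0.01 * rng.normal(size=n)
    w_hat = lasso_coordinate_descent(X, y, lam=0.1)
    print("indices of nonzero coefficients:", np.flatnonzero(np.abs(w_hat) > 1e-6))
```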

Regularized Total Least Squares: Computational Aspects and Error Bounds

For solving linear ill-posed problems, regularization methods are required when the right-hand side and the operator are contaminated by some noise. In the present paper regularized approximations are obtained by regularized total least squares and dual regularized total least squares. We discuss computational aspects and provide order-optimal error bounds that characterize the accuracy of the regularized ...

Regularized Total Least Squares Based on Quadratic Eigenvalue Problem Solvers

This paper presents a new computational approach for solving the Regularized Total Least Squares problem. The problem is formulated by adding a quadratic constraint to the Total Least Squares minimization problem. Starting from the fact that a quadratically constrained Least Squares problem can be solved via a quadratic eigenvalue problem, an iterative procedure for solving the regularized Total...

Regularized Discriminant Analysis, Ridge Regression and Beyond

Fisher linear discriminant analysis (FDA) and its kernel extension—kernel discriminant analysis (KDA)—are well-known methods that consider dimensionality reduction and classification jointly. While these methods are widely deployed in practical problems, unresolved issues remain concerning their efficient implementation and their relationship with least mean squares procedures. In this paper we address...
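One classical piece of the relationship mentioned above is easy to verify numerically: for two classes, the Fisher discriminant direction S_W^{-1}(m1 − m2) is parallel to the weight vector obtained by ordinary least squares with ±1 target coding and an intercept. The sketch below (Python with NumPy, using synthetic Gaussian data and illustrative names; it does not reproduce the paper's own results) checks this by computing the cosine of the angle between the two directions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n1, n2 = 4, 120, 80

# Two Gaussian classes sharing an anisotropic covariance, with shifted means.
A = rng.normal(size=(d, d))
chol = np.linalg.cholesky(A @ A.T + d * np.eye(d))
shift = np.array([1.0, 0.5, -0.5, 0.0])
X1 = rng.normal(size=(n1, d)) @ chol.T + shift
X2 = rng.normal(size=(n2, d)) @ chol.T - shift

# Fisher direction: S_W^{-1} (m1 - m2), with S_W the within-class scatter matrix.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fda = np.linalg.solve(S_W, m1 - m2)

# Ordinary least squares with +/-1 target coding and an intercept term.
X = np.vstack([X1, X2])
y = np.concatenate([np.ones(n1), -np.ones(n2)])
Xb = np.hstack([X, np.ones((n1 + n2, 1))])        # append a bias column
w_ls = np.linalg.lstsq(Xb, y, rcond=None)[0][:d]  # discard the intercept coefficient

cos = w_fda @ w_ls / (np.linalg.norm(w_fda) * np.linalg.norm(w_ls))
print("cosine between FDA and least-squares directions:", cos)  # ~ 1.0
```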

Publication date: 2004